



Supplementary Materials for SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection (Section 1: Datasets)

Neural Information Processing Systems

For the transductive setup, we used the three standard citation network benchmarks, Cora, Citeseer and Pubmed (Sen et al., 2008). We followed the transductive setup adopted in (Yang et al., …). Cora contains 2708 nodes, 5429 edges, 7 classes and 1433 features per node. Citeseer contains 3327 nodes, 4732 edges, 6 classes and 3703 features per node. … Critically, testing graphs remain completely unobserved during training. The average number of nodes per graph is 2372.


Revisiting semi-supervised learning in the era of foundation models

arXiv.org Artificial Intelligence

Semi-supervised learning (SSL) leverages abundant unlabeled data alongside limited labeled data to enhance learning. As vision foundation models (VFMs) increasingly serve as the backbone of vision applications, it remains unclear how SSL interacts with these pre-trained models. To address this gap, we develop new SSL benchmark datasets where frozen VFMs underperform and systematically evaluate representative SSL methods. We make a surprising observation: parameter-efficient fine-tuning (PEFT) using only labeled data often matches SSL performance, even without leveraging unlabeled data. This motivates us to revisit self-training, a conceptually simple SSL baseline, where we use the supervised PEFT model to pseudo-label unlabeled data for further training. To overcome the notorious issue of noisy pseudo-labels, we propose ensembling multiple PEFT approaches and VFM backbones to produce more robust pseudo-labels. Empirical results validate the effectiveness of this simple yet powerful approach, providing actionable insights into SSL with VFMs and paving the way for more scalable and practical semi-supervised learning in the era of foundation models.
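The abstract does not say how the predictions of the multiple PEFT approaches and VFM backbones are combined. The sketch below is therefore only a hypothetical illustration: it assumes each model outputs per-class probabilities for the unlabeled examples, averages those probabilities across models, takes the top class as the pseudo-label, and keeps only examples whose averaged confidence clears a threshold. Probability averaging, the `threshold` parameter, and the function name `ensemble_pseudo_labels` are all assumptions, not the authors' method.

```python
from typing import List, Sequence, Tuple


def ensemble_pseudo_labels(
    prob_sets: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.0,
) -> List[Tuple[int, int, float]]:
    """Hypothetical ensemble pseudo-labeling by probability averaging.

    prob_sets[m][i][c] is model m's predicted probability that unlabeled
    example i belongs to class c (models could be different PEFT methods
    or VFM backbones; this is an assumption). Returns a list of
    (example_index, pseudo_label, averaged_confidence) tuples for the
    examples whose averaged top-class probability is >= threshold.
    """
    if not prob_sets:
        raise ValueError("need predictions from at least one model")
    n_models = len(prob_sets)
    n_examples = len(prob_sets[0])
    selected = []
    for i in range(n_examples):
        n_classes = len(prob_sets[0][i])
        # Average the per-model probability vectors for example i.
        avg = [
            sum(prob_sets[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        # Top class of the averaged distribution (ties go to the lowest index).
        label = max(range(n_classes), key=lambda c: avg[c])
        confidence = avg[label]
        # Assumed filtering step: drop low-confidence pseudo-labels.
        if confidence >= threshold:
            selected.append((i, label, confidence))
    return selected


# Usage: two hypothetical models, three unlabeled examples, two classes.
model_a = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
model_b = [[0.7, 0.3], [0.2, 0.8], [0.35, 0.65]]
print(ensemble_pseudo_labels([model_a, model_b], threshold=0.6))
```

In this toy usage the two models disagree on the third example, so its averaged confidence falls below the threshold and it receives no pseudo-label. The first two examples are kept for the next round of training. Averaging probabilities rather than taking a majority vote over hard labels is one simple way to let disagreement between models lower confidence.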